Abstract 

An optimum solution free from degeneration is found to the system 
of linear algebraic equations with empirical coefficients and right-hand 
sides. The quadratic risk of estimators of the unknown solution vector 
is minimized over a class of linear systems with given square norm of 
the coefficient matrix and length of the right-hand side vector. Empirical 
coefficients and right-hand sides are assumed to be independent and nor- 
mal with known variance. It is found that the optimal estimator has the 
form of a regularized minimum square solution with an extension multi- 
ple. A simple formula is derived showing explicitly the dependence of the 
minimal risk on parameters. 

1 Introduction 

The standard solution to systems of linear equations with random coefficients 
i?x = y, where R is the coefficient matrix, and x is a vector of unknowns can 
be unstable or may not exist if the variance of coefhcients is sufficiently large. 
This can be caused by an incorrect solution and by the inconsistency of the 
system of random equations. The regularization by A.N.Tikhonov Q implies 
an artificial requirement of minimal complexity; this procedure guaranties some 
solution, but this solution provides neither the minimum of the quadratic risk, 
nor the minimum of residuals. 

The estimation of unknowns over a single observation of R and y by the 
minimum square method leads to the solution x — (R R)^ R y, which also 
can be unstable or non-existing. The confluent analysis methods |2] provide 
solutions of the form x — (R R— AI)^ R y, where A > 0, and / is the identity 
matrix. These estimates are even more sensible to the coefficients inaccuracy 
and certainly do not exist when the minimum square solution does not exist 
(that is related to the necessity of a simultaneous additional estimation of the 
matrix coefflcients). The extremum problem that is solved in the confluent 
analysis is based on the application of the maximum likelihood estimations. 
However, it is well known that the maximum likelihood estimators can have 
the quadratic risk substantially greater than the minimal one if the number of 
parameters is comparable with the sample size. 

One can expect that the a priori information on the distribution of random 
coefficients and right-hand sides can be used to improve solutions, and even by 
a single observation of the coefficient matrix if the systems of equations is large 
and random inaccuracies of the coefficients are independent. 

In paper ||^ and monograph by V.L.Girko, the problem of solutions to 
large systems of random equations is investigated. It is proved that the standard 
minimum square method using the inversion of empirical coefficient matrix leads 
to solutions with a substantial bias. The author suggests asymptotically unbi- 
ased estimators of the unknowns vector (G-estimators). However, the quadratic 
risk of these unbiased estimators also proves to be not minimal. 
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In 1^, the asymptotic approach of was apphed to find an asymptoti- 
cally unimprovable linear operator solving systems of random linear equations 
of increa-sing dimension. However, the possibilities to decrease the quadratic 
risk of solutions for finite systems were not yet investigated. 

In this paper, we obtain an estimator of the solution vector minimizing the 
square risk on the average for a class of systems with arbitrary fixed numbers 
of equations and unknowns. 

2 Optimal solution 

Let us find a general form of an unimprovable solution in the Bayes approach 
by averaging over possible equations and their solutions. 

Suppose a consistent system of linear equations is given Ayi = b, where A 
is the coefficient matrix of N rows and n columns, A'^ > n, x is the unknown 
vector of the dimension n, and b is the right-hand side vector of the dimension 
A''. The observer only knows the matrix R = A + SA, and vector y = h + Sh, 
where SA are matrices with random entries SAij ^ N(0, p/n) and 5h are vectors 
with components 6bi ^ N(0,q/n). Assume that all these random values are 
independent, z = 1, . . . , A^, j = 1, . . . ,n. 

We construct a solution procedure that is unimprovable on the average for 
a set of problems with different matrices A and vectors b, and consider Bayes 
distributions Aij ~ N(0,a/n) and xj ~ N(0, 1/n) for all i and j assuming all 
random values independent. It corresponds to a uniform distribution of entries 
of A given the quadratic norm distributed as and a uniform distribution of 
directions of vectors x. We minimize the quadratic risk of the estimator x 

D = D{x) ^F, {x~xf. (1) 

Here (and in the following) the square of a vector denotes the square of its 
length, and the expectation is calculated over the joint distribution of A, SA, x, 
and Sh. 

Let us restrict ourselves with a class K of regularized pseudo solutions of the 
form 

X - ri?^y, (2) 

where the matrix F = r{R^R) is diagonalized together with R^R and has 
eigenva-lues 7(A) corresponding to the eigenvalues A of R^R. Restrict functions 
7(u) by the requirement that it is a non- negative measurable function such that 
the product u{l + u)j'^(u) is bounded by a constant. 
Note that, for the standard solution, 7(71) — 1/u & K. 

Define a non-random distribution function for the set of eigenvalues of the 
empirical matrices R^ R as the expectation 

n 

F{u) = E n"^ ind(Ai < u), 
i=l 
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where Ai are eigenvalues of R R. Using this function wc can write, for example, 

n 

E n-ltr(Rr2RT) = n^^E Xl^i^^(^i) " / U7^(u)dF(u). 

i=l 

Theorem 1 Let an estimator x e K of the solution |^ to the system A:x. = 
b be calculated using the observed empiric matrix R = A + SA and vector 
y = h + 6h, where entries of the matrices A — {Aij} and 6 A = {SAij} and 
components of the vectors x = {■'^j} ~ {^t>j} are independent random 

values 

^N(0,a/n), xj ^ N(0, 1/n)), ^Ay - N(0, p/n), ^bj ^ N(0, q/n), 
i = 1, . . . , N, j — 1, . . . ,n, where a > 0. Then the functional |l] equals 

D{x)= I [l~2eu-i{u) + e'^u'^-f'^{u) + su-i'^{u)\ dF{u), (3) 



where 

^ a ap 

s = 



a-\-p a + p 

Note that the expression |^ is quadratic with respect to 7(w) and allows the 
standard minimization. 

Corollary Let a > 0. The expression || reaches the minimum in the class 
K for 7(w) = 7opt(u) = e/{e''u + s). 



and the minimal value of |l| is 



3 Discussion 



Thus we can draw a conclusion that the solution of empirical linear algebraic 
equations with normal errors can be stabilized and made more accurate by re- 
placing standard minimum square solution by an optimal on the average regular 
estimator 

Si = a{R^R + tI)-^R^y (4) 

with a "ridge" -parameter t — s/O"^ and a scalar scaling coefficient a = 1/9. 
Here a > 1 presents an extension coefficient in contrast to shrinkage multiples 
for well-known Stein estimators. The matrix a{R^ R + tl)^^ in ^ is, actually, a 
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scaled and regularized estimator of {A^A)'-^ optimal with respect to the min- 
imization of D. Such solutions are characteristic for a number of extremum 
problems solved in the theory of high-dimensional multivariate statistical anal- 
ysis [|. 

We can compare the optimal value Z3opt with the square risk of the standard 
solution Dstd if we set 7(u) = 1/u in ||, 

D,td^ l[{l-Of + -] dF{u). (5) 

It is easy to prove that the difference Dgtd ~ £'opt > and equals for q ^ 
and p = 0. 

For p = and q > 0, we have 9—1, the matrix R is non-random, a = 
1, t ~ q and Dopt is less Dstd due to the factor u/{u + q) in the integrals over 
dF{u). 

If p > or g > 0, the expression ^ contains J u^^dF{u) with a non-zero 
coefficient. Note that matrices W — R are the Wishart n x n matrices, 
calculated for variables distributed as N(0, (a -I- p)N/n) over a sample of size 
N. The integral J u^^dF{u) — E n^-'-trW"-'-, where the right-hand is infinite 
for n > N. Thus, for n = N the quadratic risk of standard solution is infinitely 
large while I?opt = 1 for the optimal solution. 

Proof of the theorem 1. 

Note that _D is a sum of three addends 

D = E x'^x 2E x^rR'^y + E y'^RF^R'^y. 

Let angular brackets denote the normed trace of a matrix: (M) — n^^trAI. 
First, we average over the distribution of Sh. Let us use the obvious property 
of normal distributions 

E (Sh^MSh) = qn-^trM, 

where M is an x matrix of constants. We find that 

D = E x'^x 2E x'^rR'^b + E b'^RF^R'^b + qE (RF^r'^). 

We average with respect to the distribution of x. In view of the equality 
b — Ax we have 

£> = 1 - 2E (Fr'^A) + E (a'^RF^r'^A) + qE (RF^r'^}. (6) 

In the further transformations, we use a simple property of normal distributions: 
if a random value r is normally distributed with zero average and the variance 
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a^, then for any differentiable function /(■), we have 

Erf(r)=a2E|^. (7) 

For example, consider the second term in |[ Apply ^ to the normal variable A. 
We obtain 

E (FrTa) = n 1e Aij(rRT)j. = -Je ^X^T^' 

where (and in the following) the summation over repeated indexes is implied. 

Note that all entries of the random matrix R have the joint variance (a+p) jn. 
Therefore, 

E(rRTR) = n-lERy(rRT,j,.S±E?<^^ 

Let a > 0. Comparing these two expressions we find that 

E (FR'^A) =a(a + p)-lE (rR^R) 
Similarly, one can derive the relation 

E (r'^RF^rTa) =a(a + p)^E (R^RF^rTr) 
Once again, we use |^ with respect to the random A and obtain: 

E (aTrf^rTa) = -^E A,: — ^ +aE (RF^rT), 

where (and in the following) fc = 1, . . . , iV. Further, we use ^ with respect to 
bA and obtain 

a/'Rp2 j^T\ 
E (aTrf^rT^a) = -^E A:i— — — 

Adding two latter equalities we have 

E (A^RF^rTr) = ^-LEe Aj; — ^ + aE (RF^rT) 

Now we combine the five latter equalities and obtain 

2 

E (A^RF^rTa) = ^E (rTrf^rTr) + (RF^rT). (8) 

(a + p)^ a + p 



These relations are suiBcient to express D in terms of the observed variables 
only. Substitute ^ to ||. We have 

2 

D = 1-2— (rRTR} + -E (RTRr2RTR)+sE (RF^rT). (9) 

a + P (a + p)^ 

Passing to the (random) system of coordinates in which R is diagonal we can 
replace formally 

n ^ 

n-i^^(A,)= / v{u)dF{u), 

i—l 

where Ai are eigenvalues of R^ R. The expression ^ follows from ^. The proof 
of Theorem 1 is complete. 
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